Manual Corpus Annotation: Giving Meaning to the Evaluation Metrics

Authors

  • Yann Mathet
  • Antoine Widlöcher
  • Karën Fort
  • Claire François
  • Olivier Galibert
  • Cyril Grouin
  • Juliette Kahn
  • Sophie Rosset
  • Pierre Zweigenbaum
Abstract

Computing inter-annotator agreement measures on a manually annotated corpus is necessary to evaluate the reliability of its annotation. However, the interpretation of the resulting values is widely recognized as highly arbitrary. In this article we describe a method, and a tool we developed, that “shuffles” a reference annotation according to different error paradigms, thereby creating artificial annotations with controlled errors. Agreement measures are computed on these corpora, and the results are used to model the behavior of these measures and understand their actual meaning.
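The shuffling idea can be illustrated with a minimal sketch. It assumes the simplest possible setting: a flat sequence of category labels (the hypothetical tags "PER", "LOC", "ORG"), a single error paradigm (category substitution at a given magnitude), and Cohen's kappa as the agreement measure. It is not the authors' tool, which handles several error paradigms; the function names below are illustrative only.

```python
import random
from collections import Counter
from statistics import mean


def cohen_kappa(a, b):
    """Cohen's kappa between two equal-length sequences of category labels."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement estimated from each annotator's own label distribution.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:  # degenerate case: both used one and the same label
        return 1.0
    return (observed - expected) / (1.0 - expected)


def shuffle_categories(reference, categories, magnitude, rng):
    """Hypothetical error paradigm: each label is replaced, with probability
    `magnitude`, by a different category chosen uniformly at random."""
    shuffled = []
    for label in reference:
        if rng.random() < magnitude:
            shuffled.append(rng.choice([c for c in categories if c != label]))
        else:
            shuffled.append(label)
    return shuffled


def agreement_curve(reference, categories, magnitudes, trials=5, seed=0):
    """For each error magnitude, build two independently shuffled copies of the
    reference (two artificial annotators) and average their kappa over trials."""
    rng = random.Random(seed)
    curve = {}
    for m in magnitudes:
        scores = [
            cohen_kappa(
                shuffle_categories(reference, categories, m, rng),
                shuffle_categories(reference, categories, m, rng),
            )
            for _ in range(trials)
        ]
        curve[m] = mean(scores)
    return curve


# Hypothetical reference annotation: 1,000 items with three categories.
categories = ["PER", "LOC", "ORG"]
ref_rng = random.Random(42)
reference = [ref_rng.choice(categories) for _ in range(1000)]

curve = agreement_curve(reference, categories, [0.0, 0.2, 0.4, 0.6])
for m, k in curve.items():
    print(f"magnitude={m:.1f}  kappa={k:.3f}")
```

Reading an observed kappa against such a curve gives a rough indication of the error rate it corresponds to, under this one paradigm only. Such curves are worth computing rather than assuming: with three categories and this paradigm, kappa does not keep falling all the way to magnitude 1.0, because two fully shuffled copies still agree on about half the items.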


Similar resources

Results from the ML4HMT Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid MT

We describe the ML4HMT shared task which aims to foster research on improved system combination approaches for MT. Participants of the challenge are requested to build hybrid translations by combining the output of several MT systems of different types. We describe the ML4HMT corpus and the annotation format we have designed for it and briefly summarize the participating systems. Using automate...


Annotating Attributions And Private States

This paper describes extensions to a corpus annotation scheme for the manual annotation of attributions, as well as opinions, emotions, sentiments, speculations, evaluations and other private states in language. It discusses the scheme with respect to the “Pie in the Sky” Check List of Desirable Semantic Information for Annotation. We believe that the scheme is a good foundation for adding priv...


The ML4HMT Workshop on Optimising the Division of Labour in Hybrid Machine Translation

We describe the “Shared Task on Applying Machine Learning Techniques to Optimise the Division of Labour in Hybrid Machine Translation” (ML4HMT) which aims to foster research on improved system combination approaches for machine translation (MT). Participants of the challenge are requested to build hybrid translations by combining the output of several MT systems of different types. We first des...


Annotating Expressions of Opinions and Emotions in Language

This paper describes a corpus annotation project to study issues in the manual annotation of opinions, emotions, sentiments, speculations, evaluations and other private states in language. The resulting corpus annotation scheme is described, as well as examples of its use. In addition, the manual annotation process and the results of an inter-annotator agreement study on a 10,000-sentence corpu...


Transitions thématiques : Annotation d'un corpus journalistique et premières analyses (Manual thematic annotation of a journalistic corpus : first observations and evaluation) [in French]

Manual thematic annotation of a journalistic corpus : first observations and evaluation. The work presented in this paper focuses on the creation of a corpus of journalistic texts annotated at discourse level, more precisely at topic level. The annotation model is a classic segmentation one, to which we add transition zones between topical units. We assume that in a well-structured text, the a...




Publication date: 2012